Introduction

Abstract

This project investigates the relationship between police incident records, time, and location in San Francisco, California, during 2018-2020, using R Markdown with the packages readr, dplyr, ggplot2, ggrepel, and leaflet. Since early 2020, a clear decrease can be observed in Assault, Larceny Theft, Lost Property, Non-Criminal, Other Miscellaneous, Robbery, and Warrant records. In contrast, Burglary, Motor Vehicle Theft, and Recovered Vehicle records have increased somewhat since early 2020. Incidents are mostly reported during afternoon hours, especially around 5-8 PM on weekdays and around noon on all days, and many are reported around 12 AM on weekends. In conclusion, the factor that most affects when an incident or arrest happens is the crime category, while other factors such as month and police district have little impact.

Overview and Motivation

This project investigates the relationship between police incident records, time, and location in San Francisco, California, during 2018-2020, based on the San Francisco police open data. Because crime is a major social concern in urban areas, knowing when these unlawful activities happen and are reported can be important to San Francisco as a major U.S. city. Although this project does not establish why crimes happen at certain times, possible explanations are offered. The dataset contains police records in San Francisco since 2018, but only data from 2018 to 2020 are included in the analysis. Records with too many missing critical values are removed from the analysis. These police records are used as an indicator of unlawful activity, which is referred to as crime in the following sections.

Questions:

What is the trend of daily police records in San Francisco? In which areas does crime happen more frequently than in others? Which types of crime happen frequently? Is there variation between different kinds of crime? At what times does theft happen most frequently? What are the possible factors behind when crimes happen, and how are they correlated? Is there a correlation between the number of incidents and the crime category? Is there a correlation between the number of incidents and the police district?

Data Preprocessing

Preprocess the data

The original column names are verbose. This step shortens the column names that are used in the following analysis.

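library(tidyverse)  # attaches dplyr (rename, mutate, %>%) along with readr and ggplot2

# `df` is assumed to already hold the raw incident records,
# for example as loaded earlier with readr::read_csv().
# Shorten the verbose column names used throughout the analysis.
df <- df %>%
  rename(Date       = `Incident Date`,
         Time       = `Incident Time`,
         Year       = `Incident Year`,
         DayOfWeek  = `Incident Day of Week`,
         Category   = `Incident Category`,
         Descript   = `Incident Description`,
         PdDistrict = `Police District`,
         Y          = Latitude,    # Y holds the latitude
         X          = Longitude) %>%  # X holds the longitude
  mutate(Time = as.character(Time))  # store the incident time as text

# str(df)  # uncomment to inspect the resulting column types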
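The restriction to 2018-2020 and the removal of incomplete records mentioned earlier could look roughly like the sketch below. It runs on a small made-up table, and it assumes that Date, Time, and Category are the "critical" fields and that any missing value in them disqualifies a record; the actual criteria used in this project may differ. The `Hour` column is a hypothetical helper added for illustration.

```r
library(dplyr)

# Toy stand-in for the renamed incident table; all values are made up.
toy <- data.frame(
  Date     = as.Date(c("2018-03-01", "2019-07-15", "2020-11-30",
                       "2021-01-02", "2020-05-05")),
  Time     = c("17:30:00", "00:05:00", NA, "12:00:00", "08:45:00"),
  Category = c("Larceny Theft", "Assault", "Burglary", "Fraud", NA)
)

cleaned <- toy %>%
  # Keep only incidents dated 2018 through 2020.
  filter(Date >= as.Date("2018-01-01"), Date <= as.Date("2020-12-31")) %>%
  # Assumption: a record is dropped if any of these fields is missing.
  filter(!is.na(Time), !is.na(Category)) %>%
  # Hypothetical helper column: hour of day (0-23) taken from "HH:MM:SS".
  mutate(Hour = as.integer(substr(Time, 1, 2)))

print(cleaned)
```

Only the 2018 and 2019 rows survive here: the 2021 row falls outside the period, and the other two are missing a time or a category.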

Data Overview

Daily Record Trend

The graph indicates that the number of daily police records decreased dramatically during 2020, probably because people stayed at home during the Covid-19 pandemic. The lowest daily counts occur around March to April 2020, when the government announced quarantine measures. In general, the daily number of police records in 2020 is much lower than in the previous years.
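A daily trend of this kind can be built by counting records per date and drawing a line chart. The following is a minimal sketch on randomly generated toy dates, not the project's data; the column name `Records` and the axis labels are illustrative choices.

```r
library(dplyr)
library(ggplot2)

# Toy data: 200 made-up incident dates spread over ten days.
set.seed(1)
toy <- data.frame(
  Date = sample(seq(as.Date("2020-01-01"), as.Date("2020-01-10"), by = "day"),
                size = 200, replace = TRUE)
)

# One row per day with the number of records on that day.
daily <- toy %>%
  count(Date, name = "Records")

# Line chart of the daily counts; call print(p) to draw it.
p <- ggplot(daily, aes(x = Date, y = Records)) +
  geom_line() +
  labs(x = "Date", y = "Daily police records")
```

On the full dataset the same `count()` call yields one row per calendar day, so a sustained drop in 2020 would appear as a lower band in the line.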

Interactive Map of Crime Incidents

This map shows the locations of crime incidents during 2018-2020. Clicking a marker on the map shows the incident details. There are many incidents in the Tenderloin and the Mission, as well as in Bayview and Fisherman's Wharf. The Tenderloin district has the highest concentration of crimes, which extends north of it into Downtown. South of Market St. and west of US 101, the number of incidents decreases dramatically. Another noteworthy pattern is that incidents decrease going east or north from the Tenderloin towards Nob Hill and the Financial District, but increase again near the waterfront.
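An interactive map with clickable markers can be sketched with leaflet as follows. The three points and their descriptions are made up, and marker clustering is an assumption added here to keep a dense map readable; it is not necessarily how the map in this project was built. As in the preprocessing step, `X` is the longitude and `Y` is the latitude.

```r
library(leaflet)

# Toy incidents; coordinates and descriptions are made up for illustration.
toy <- data.frame(
  X        = c(-122.4128, -122.4194, -122.3894),
  Y        = c(37.7838, 37.7599, 37.7299),
  Category = c("Larceny Theft", "Assault", "Burglary"),
  Descript = c("Theft from vehicle", "Battery", "Burglary, residence")
)

# Build the map widget: base tiles plus one marker per incident.
m <- leaflet(toy) %>%
  addTiles() %>%
  addMarkers(
    lng = ~X, lat = ~Y,
    popup = ~paste0(Category, ": ", Descript),  # text shown when a marker is clicked
    clusterOptions = markerClusterOptions()     # assumption: cluster nearby markers
  )
```

Printing `m` in an R Markdown chunk or at the console renders the widget.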

Data Aggregation

Summarize the data by incident category. According to the list of the most frequent record categories, Larceny Theft accounts for about 30% of all records, the largest share of any category. The ten most frequent categories, starting from the most frequent, are Larceny Theft, Other Miscellaneous, Malicious Mischief, Non-Criminal, Assault, Burglary, Motor Vehicle Theft, Recovered Vehicle, Warrant, and Lost Property.

## # A tibble: 20 x 3
##    Category                                 Frequency Percentage
##    <chr>                                        <int>      <dbl>
##  1 Larceny Theft                               142622     30.2  
##  2 Other Miscellaneous                          34758      7.35 
##  3 Malicious Mischief                           31074      6.57 
##  4 Non-Criminal                                 29055      6.14 
##  5 Assault                                      28308      5.99 
##  6 Burglary                                     26585      5.62 
##  7 Motor Vehicle Theft                          21894      4.63 
##  8 Recovered Vehicle                            17141      3.62 
##  9 Warrant                                      15508      3.28 
## 10 Lost Property                                14690      3.11 
## 11 Fraud                                        14235      3.01 
## 12 Drug Offense                                 11484      2.43 
## 13 Robbery                                      11071      2.34 
## 14 Missing Person                               10616      2.25 
## 15 Suspicious Occ                                9440      2.00 
## 16 Disorderly Conduct                            8023      1.70 
## 17 Offences Against The Family And Children      6602      1.40 
## 18 Traffic Violation Arrest                      5526      1.17 
## 19 Miscellaneous Investigation                   4456      0.942
## 20 Other Offenses                                4009      0.848
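A frequency table with the same three columns as the one above can be produced with dplyr as sketched below. The input is a ten-row toy table, so the numbers are illustrative only.

```r
library(dplyr)

# Toy data: ten made-up records in three categories.
toy <- data.frame(
  Category = c(rep("Larceny Theft", 6), rep("Assault", 3), "Burglary")
)

by_category <- toy %>%
  # Count records per category, most frequent first.
  count(Category, name = "Frequency", sort = TRUE) %>%
  # Share of all records, in percent.
  mutate(Percentage = 100 * Frequency / sum(Frequency)) %>%
  # Keep at most the 20 most frequent categories.
  slice_head(n = 20)

print(by_category)
```

Because the percentage is computed before `slice_head()`, it is a share of all records, not only of the categories that are kept.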

This pie chart shows the percentage that each record category takes in all records.

The following is a bar plot of incident categories with high frequency.

Record number by date and category: According to the graph, Burglary had a peak around May 2020 and has increased slightly since early 2020. Larceny Theft has the largest variation and shows a dramatic decrease since early 2020, similar to the trend for all incident records. Lost Property has also experienced some decrease since early 2020, and Warrant had a few peaks during 2019. Overall, a clear decrease since early 2020 can be observed in Assault, Larceny Theft, Lost Property, Non-Criminal, Other Miscellaneous, Robbery, and Warrant. In contrast, Burglary, Motor Vehicle Theft, and Recovered Vehicle have experienced some increase since early 2020.

Correlation Analysis for Crime Reports

The following graphs aim to explain the relationship between the time of day, the day of the week, and incident records.

Factor by Crime Category

To examine crime category as a factor in the time at which incidents are reported, the heatmap is displayed by crime category. Some criminal activities, such as prostitution, might presumably occur predominantly at night. Faceting is a ggplot2 technique that makes such analysis easier by producing a separate graph for each value of another variable. Here, a heatmap is drawn with ggplot2 for each of the top incident categories to check whether the pattern changes meaningfully between categories.
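A faceted heatmap of this kind can be sketched as follows: count records per category, day of the week, and hour, then draw one `geom_tile()` panel per category with `facet_wrap()`. The data are randomly generated, and the `Hour` column and the colour scale are assumptions made for the illustration.

```r
library(dplyr)
library(ggplot2)

days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")

# Toy data: 500 made-up incidents with a category, weekday, and hour.
set.seed(42)
toy <- data.frame(
  Category  = sample(c("Larceny Theft", "Assault", "Burglary"), 500,
                     replace = TRUE, prob = c(0.6, 0.25, 0.15)),
  DayOfWeek = sample(days, 500, replace = TRUE),
  Hour      = sample(0:23, 500, replace = TRUE)
)

counts <- toy %>%
  # Fix the weekday order so the axis does not sort alphabetically.
  mutate(DayOfWeek = factor(DayOfWeek, levels = days)) %>%
  count(Category, DayOfWeek, Hour, name = "Records")

# One heatmap panel per category; call print(p) to draw it.
p <- ggplot(counts, aes(x = Hour, y = DayOfWeek, fill = Records)) +
  geom_tile() +
  facet_wrap(~ Category) +
  scale_fill_gradient(low = "white", high = "steelblue") +
  labs(x = "Hour of day", y = "Day of week", fill = "Records")
```

All panels share one fill scale, so a category with far more records than the others will dominate the colours.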

As shown in Fig. 9, Larceny Theft records dominate the data so heavily that the time distribution of the other categories cannot be seen, so the data need normalization. After applying normalized percent inhibition, Fig. 10 shows some interesting patterns. Burglary mostly happens around 5 PM on Fridays and after midnight around 2-3 AM, whereas other incident categories are least likely to happen around 2-3 AM. A high percentage of Assault incidents occur around 12-2 AM on weekends. Drug Offense mostly happens during 2-4 PM on Tuesdays and 1-3 PM on Wednesdays. Fraud is the category with the most records around 12 AM and 12 PM, possibly because a fraud case can have many victims but few suspects, and the police may enter each victim as an individual record. A larger number of Larceny Theft incidents happen between 6 PM and 7 PM on weekdays. Lost Property has the most records around 12 AM on Fridays, Saturdays, and Sundays, and around 12 PM on Saturdays. There is a high percentage of Motor Vehicle Theft incidents around 5 PM on Fridays and around 5-6 PM on other weekdays. Recovered Vehicle records cluster from about 9 AM to noon on Mondays and Tuesdays. In general, most incidents occur during the daytime, while a few categories happen more often around midnight.
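The exact normalization formula is not given here, so the following is only a sketch under an assumption: min-max scaling of the counts within each category, so that every category spans 0 to 1 and a large category no longer hides the others. The column name `Scaled` and the toy counts are hypothetical.

```r
library(dplyr)

# Toy counts per category and hour; the numbers are made up.
counts <- data.frame(
  Category = c(rep("Larceny Theft", 3), rep("Burglary", 3)),
  Hour     = rep(c(2, 12, 18), times = 2),
  Records  = c(100, 400, 900, 30, 10, 20)
)

normalized <- counts %>%
  group_by(Category) %>%
  # Assumed normalization: rescale each category to the range [0, 1].
  # A category whose counts are all equal would divide by zero and give NaN.
  mutate(Scaled = (Records - min(Records)) / (max(Records) - min(Records))) %>%
  ungroup()

print(normalized)
```

After scaling, the Burglary peak at 2 AM and the Larceny Theft peak at 6 PM both reach 1, although their raw counts differ by a factor of 30.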

Factor by Police District

As above, but with police districts: Fig. 11 displays the time and day of the week of incident records in each police district. Most records fall in the daytime on weekdays and around midnight on weekends. The patterns do not differ significantly among police districts.

Factor by Month

If crime is tied to activities, then periods of the year with special activities, such as holiday seasons, may have an impact. Months are reordered using the factor function. However, according to Fig. 12, the months show similar patterns, which suggests there is little relation between the month and when incidents happen.
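Reordering months with `factor()` matters because month names otherwise sort alphabetically (April, August, December, and so on). A minimal base R sketch with made-up dates:

```r
# Toy incident dates, deliberately out of calendar order.
dates <- as.Date(c("2019-12-25", "2019-01-01", "2019-07-04", "2019-01-15"))

# Month names via the month number, which avoids locale-dependent formatting.
month_names <- month.name[as.integer(format(dates, "%m"))]

# Calendar-ordered factor: the levels follow January through December.
Month <- factor(month_names, levels = month.name)

# Counts per month, listed in calendar order.
print(table(Month))
```

Plots and tables built on `Month` then list January first and December last, including months with no records.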

Factor by Year

Because changes over the years may affect when incidents happen, the year is also analyzed as a factor. In 2020, there are more incident records around midnight but fewer during the late afternoon on weekdays, compared with weekdays in 2018 and 2019.

Conclusion

Based on the graphs above, fewer incidents generally happen at night on weekdays than on weekends, and most incidents happen during the afternoon on weekdays. Crime category can be an important factor in when crime happens, since different categories show different time patterns. The timing of crimes may also be affected by the year if there is a disruption lasting over a year, such as Covid-19. However, when crimes happen does not appear to be related to the month or the police district.